Professor or Screaming Beast? Detecting Anomalous Words in Chinese
Abstract
The Internet has become a very popular platform for communication around the world. However, because most modern computer keyboards are Latin-based, speakers of Asian languages such as Chinese cannot input characters (Hanzi) directly with these keyboards. As a result, methods for representing Chinese characters using the Latin alphabet were introduced. The most popular of these is the Pinyin input system. Pinyin is also called "Romanised" Chinese in that it represents the pronunciation of a Chinese character phonetically. Because the mapping from Pinyin to Chinese characters is highly ambiguous, word misuse can occur when typing on a standard computer keyboard, and more commonly so in Internet chat rooms or instant messengers, where the language used is less formal. In this paper we aim to develop a system that can automatically identify such anomalies, whether they are simple typos or intentional substitutions. After identifying them, the system should suggest the correct word to be used.
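To make the ambiguity concrete, here is a minimal Python sketch of homophone-based suggestion. It is not the system described in the paper. The tiny Pinyin table, the "standard" lexicon, and the function name `suggest_corrections` are all hypothetical and hand-made for illustration. The pair 教授 ("professor") and 叫兽 ("screaming beast") is presumably the substitution the title alludes to. Both are typed as `jiaoshou` when tones are ignored.

```python
# Toy illustration only: the tables below are hand-made assumptions,
# not data, code, or the method from the paper.

# Hypothetical word -> toneless Pinyin table.
PINYIN = {
    "教授": "jiaoshou",  # "professor"
    "叫兽": "jiaoshou",  # "screaming beast", an informal substitution
    "版主": "banzhu",    # "forum moderator"
    "斑竹": "banzhu",    # "mottled bamboo", an informal substitution
}

# Hypothetical lexicon of words treated as standard usage.
STANDARD_LEXICON = {"教授", "版主"}


def suggest_corrections(word: str) -> list[str]:
    """Return standard homophones for a word outside the standard lexicon.

    An empty list means the word is already standard, or that this toy
    Pinyin table has no entry for it.
    """
    if word in STANDARD_LEXICON or word not in PINYIN:
        return []
    reading = PINYIN[word]
    # Any standard word with the same toneless Pinyin is a candidate.
    return sorted(
        candidate
        for candidate in STANDARD_LEXICON
        if PINYIN.get(candidate) == reading
    )


if __name__ == "__main__":
    for token in ["教授", "叫兽", "斑竹"]:
        print(token, suggest_corrections(token))
```

A real detector would also need context, since a non-standard homophone may be a legitimate word in another sentence. This sketch only shows why a single Pinyin string can map to several written forms.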
Similar resources
Design and Implementation of a Software System for Detecting Orthographical or Morphological Errors in Persian Words
This paper presents a new method for analyzing words in the Persian language context to find orthographical and structural errors regardless of the meaning. This technique tokenizes each word in a statement then tries to detect the kind of word, and analyses its correctness in terms of orthography and morphology by means of a lexicon. It should be noted that some words in the Persian language h...
First Language Activation during Second Language Lexical Processing in a Sentential Context
Lexicalization-patterns, the way words are mapped onto concepts, differ from one language to another. This study investigated the influence of first language (L1) lexicalization patterns on the processing of second language (L2) words in sentential contexts by both less proficient and more proficient Persian learners of English. The focus was on cases where two different senses of a polys...
The Effects of Phonological Neighborhoods on Spoken Word Recognition in Mandarin Chinese
Title of Document: The Effects of Phonological Neighborhoods on Spoken Word Recognition in Mandarin Chinese. Pei-Tzu Tsai, Master of Arts, 2007. Directed by: Professor Nan Bernstein Ratner, Department of Hearing and Speech Sciences; Associate Professor Rochelle Newman, Department of Hearing and Speech Sciences. Spoken word recognition is influenced by words similar to the target word with one phoneme...
Separation Between Anomalous Targets and Background Based on the Decomposition of Reduced Dimension Hyperspectral Image
Anomaly detection holds a special place among the different kinds of hyperspectral image processing. Nowadays, many methods use only background information to distinguish anomalous pixels from the background. Due to noise and the presence of anomaly pixels in the background, the assumption of the specific statistical distribution of the background, as well as the co...
Anomaly Detecting Within Dynamic Chinese Chat Text
The problem in processing Chinese chat text originates from the anomalous characteristics and dynamic nature of such a text genre. That is, it uses ill-edited terms and anomalous writing styles in chat text, and the anomaly is created and discarded very quickly. To handle this problem, one solution is to re-train the recognizer periodically. This costs a lot of manpower in producing the timely ...
Publication date: 2008